Text identification for document image analysis using a neural network

نویسندگان

  • Charalambos Strouthopoulos
  • Nikos Papamarkos
چکیده

A new bottom-up method is described that clusters the content of a mixed type document into text or non-text areas. The proposed approach is based on a new set of features combined with a self-organized neural network classifier. The set of features corresponds to the contents and the relationship of 3 3 3 masks, is selected by using a statistical reduction procedure, and provides texture information. Next, a Principal Components Analyzer (PCA) is applied, which results in a reduced number of ‘effective’ features. The final set of features is then utilized as input vector into a proper neural network to achieve the classification goal. The neural network classifier is based on a Kohonen Self Organized Feature Map (SOFM). Document blocks are classified as text, graphics, and halftones or to secondary subclasses corresponding to special cases of the primal classes. The proposed method can identify text regions included in graphics or even overlapped regions, that is, regions that cannot be separated with horizontal and vertical cuts. The performance of the method was extensively tested on a variety of documents with very promising results. q 1998 Elsevier Science B.V. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

متن کامل

Document Analysis And Classification Based On Passing Window

In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

متن کامل

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications that includes, reading aid for blind, bank cheques and conversion of any hand written document into structural text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...

متن کامل

Identification of selected monogeneans using image processing, artificial neural network and K-nearest neighbor

Abstract Over the last two decades, improvements in developing computational tools made significant contributions to the classification of biological specimens` images to their correspondence species. These days, identification of biological species is much easier for taxonomist and even non-taxonomists due to the development of automated computer techniques and systems.  In this study, we d...

متن کامل

Porosity classification from thin sections using image analysis and neural networks including shallow and deep learning in Jahrum formation

The porosity within a reservoir rock is a basic parameter for the reservoir characterization. The present paper introduces two intelligent models for identification of the porosity types using image analysis. For this aim, firstly, thirteen geometrical parameters of pores of each image were extracted using the image analysis techniques. The extracted features and their corresponding pore types ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Image Vision Comput.

دوره 16  شماره 

صفحات  -

تاریخ انتشار 1998